6. Related Work
7. Conclusions
5.4. Effects of Processor Speed
Figure 7. SC-COMA's Slowdown for Different Processor Speeds (75% Pressure)
Table 3: Asymptotic Slowdown for SC-COMA
Figure 8. Execution Times for SC-COMA and HW-COMA with Varying Processor Speeds

Authors

  • J. Kubiatowicz
  • D. Chaiken
Abstract

Protocol engine occupancy, relative to measurements in FLASH [10], is reduced by several effects. The COMA organization maximizes the number of accesses that hit in local memory, and such accesses bypass the protocol engine, generating zero occupancy, unlike in FLASH. Servicing external requests while the main processor is blocked on a remote access, or even on synchronization, also produces zero occupancy.

The performance of SC-COMA compares favorably with an idealized hardware-implemented COMA. Execution times on 32 nodes for six benchmarks indicate a slowdown of 11-37% at 25% memory pressure and 21-56% at 75% memory pressure. SC-COMA scales well up to 32 processors. Our investigation of the effects of faster processors, relative to memory and network speeds, revealed that SC-COMA's slowdown is reduced as the overall contribution of the software overhead to the remote latency shrinks. However, the slowdown cannot drop below a certain threshold because SC-COMA integrates the protocol engine only loosely with the network interface and the Access Checking Device.

The results we have presented are encouraging given the simplicity of the current protocol. We can expect improvements from further optimizations and extensions, such as ownership hints [1], application-specific protocols, and adaptive sequential prefetching.

… execution times for some benchmarks using processors clocked up to 1 GHz. The results for the other three benchmarks are similar to Barnes and LU. The curves for LU show that a 275 MHz SC-COMA performs like a 200 MHz HW-COMA. The same is true for a 450 MHz SC-COMA and a 300 MHz HW-COMA. In essence, this indicates that SC-COMA could be a viable solution for the present and the near future. The release of a hardware COMA using 300 MHz processors could take as long as the development of a next-generation 450 MHz processor, which a software COMA can use immediately, at virtually no cost.
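The clock-speed equivalences quoted for LU can be restated as ratios. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and the only inputs are the MHz pairs and percentage ranges stated in the text. Note that the clock ratio at equal execution time is not the same quantity as the slowdown at equal clock speed; the sketch keeps the two apart.

```python
# Clock-speed pairs quoted in the text for LU: (SC-COMA MHz, HW-COMA MHz)
# at which the two systems reportedly perform alike.
EQUIVALENT_CLOCKS_MHZ = [(275, 200), (450, 300)]


def clock_ratio(sc_mhz, hw_mhz):
    """Factor by which SC-COMA's clock exceeds HW-COMA's at equal performance."""
    return sc_mhz / hw_mhz


def percent_to_factor(slowdown_percent):
    """Convert a slowdown given in percent (e.g. 37) to a factor (e.g. 1.37)."""
    return 1 + slowdown_percent / 100


if __name__ == "__main__":
    for sc, hw in EQUIVALENT_CLOCKS_MHZ:
        ratio = clock_ratio(sc, hw)
        print(f"{sc} MHz SC-COMA ~ {hw} MHz HW-COMA (clock ratio {ratio:.3f})")

    # Slowdown ranges quoted for 32 nodes, expressed as factors.
    for pressure, (low, high) in {"25%": (11, 37), "75%": (21, 56)}.items():
        print(
            f"memory pressure {pressure}: "
            f"{percent_to_factor(low):.2f}x to {percent_to_factor(high):.2f}x"
        )
```

The required clock ratio grows from the first pair to the second, which fits the text's remark that the slowdown cannot fall below a certain threshold as processors get faster.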
The current version of SC-COMA runs on sequentially consistent hardware, so the store buffers are disabled. In Figure 8, a third curve is plotted for a HW-COMA in which all write stalls have been eliminated. At 200 MHz, SC-COMA shows a slowdown between 1.36 for LU and 3.25 for Radix compared to this HW-COMA with ideal release consistency. However, the software protocol in SC-COMA could also be upgraded to run on a release-consistent substrate. This would require the software to be able to recover pending stores (address and data) from the buffer, after they …


Similar resources

8. Related Work

A low overhead, software-only approach for supporting fine-grain shared memory. 20 We have compared three hybrid architectures: SCC-NUMA, SC-COMA and SS-NUMA, the hybrid version of the Simple COMA in which memory is allocated at the page level at replication time. Basically we show that, with some additional hardware support, SC-COMA with a memory associativity of four reaches the best (or clos...


Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications

Background and Objectives: Digital signal processors are widely used in energy constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. In this work and in the first step, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...


An Effective Hybrid Genetic Algorithm for Hybrid Flow Shops with Sequence Dependent Setup Times and Processor Blocking

Hybrid flow-shop or flexible flow shop problems have remained subject of intensive research over several years. Hybrid flow-shop problems overcome one of the limitations of the classical flow-shop model by allowing parallel processors at each stage of task processing. In many papers the assumptions are generally made that there is unlimited storage available between stages and the setup times a...


Computing Static Slowdown Factors under EDF Scheduling when Deadline less than Period

Slowdown factors determine the extent of slowdown a computing system can experience based on functional and performance requirements. Dynamic Voltage Scaling (DVS) of a processor based on slowdown factors can lead to considerable energy savings. This paper describes computation of slowdown factor for a task set with an underlying dynamic priority scheduler such as Earliest Deadline First (EDF) ...


BOLD responses in the superior colliculus and lateral geniculate nucleus of the rat viewing an apparent motion stimulus

In rats, the superior colliculus (SC) is a main destination for retinal ganglion cells and is an important subcortical structure for vision. Electrophysiology studies have observed that many SC neurons are highly sensitive to moving objects, but complementary non-invasive functional imaging studies with larger fields of view have been rarely conducted. In this study, BOLD fMRI is used to measur...




Publication date: 1997